Abstract

Inspired by the trend on unifying theories of programming, this paper 
shows how the algebraic treatment of standard data dependency theory 
equips relational data with functional types and an associated type sys- 
tem which is useful for type checking database operations and for query 
optimization. 

Such a typed approach to database programming is then shown to be
of the same family as other programming logics such as, e.g., Hoare logic or
that of strongest invariant functions, which has been used in the analysis
of while statements.

The prospect of using automated deduction systems such as Prover9 
for type-checking and query optimization on top of such an algebraic ap- 
proach is considered. 

Keywords: Unifying theories of programming; theoretical foundations; 
data dependencies. 

1 Prelude 

In a paper addressing the influence of Alfred Tarski (1901-83) in computer
science, Solomon Feferman (2006) quotes the following statement by his colleague
John Etchemendy: "You see those big shiny Oracle towers on Highway 101?
They would never have been built without Tarski's work on the recursive
definitions of satisfaction and truth".

The 'big shiny Oracle towers' are nothing but the headquarters of Oracle
Corporation, the giant database software provider sited in the San Francisco
Peninsula. Still Feferman (2006): "Does Larry Ellison know who Tarski is or
anything about his work? [...] I learned subsequently from Jan Van den Bussche
that [...] he marks the reading of Codd's seminal paper as the starting point
leading to the Oracle Corporation."



Bussche (2001) had in fact devoted attention to relating Codd and Tarski's
work: "We conclude that Tarski produced two alternatives for Codd's relational
algebra: cylindric set algebra, and relational algebra with pairing [...] For
example, we can represent the ternary relation $\{(a, b, c), (d, e, f)\}$ as
$\{(a, (b, c)), (d, (e, f))\}$". Still Bussche (2001):

"Using such representations, we leave it as an exercise to the reader
to simulate Codd's relational algebra in RA+ [relational algebra with
pairing]".

To the best of the author's knowledge, nobody has thus far addressed this
exercise in a thorough way. Instead, standard relational database theory
(Maier, 1983; Abiteboul et al., 1995) includes a well-known relation algebra,
but this is worked out in set theory and quantified logic, far from the
objectives of Tarski's life-long pursuit in developing methods for elimination
of quantifiers from logic expressions, an effort which ultimately led to his
formalization of set theory without variables (Tarski and Givant, 1987).

The topic has acquired recent interest with the advent of work on
implementing extensions of Tarski's algebra in automated deduction systems such
as Prover9 and the associated counterexample generator Mace4 (Höfner and
Struth, 2007). This offers a potential for automation which has not been
acknowledged by the database community. In this context, it is worth mentioning
an early concern of the founding fathers of the standard theory (Beeri et al.,
1977):

"[A] general theory that ties together dependencies, relations and
operations on relations is still lacking".

More than 30 years later, this concern is still justified, as database
programming standards remain insensitive to techniques such as formal
verification and extended static checking (Flanagan et al., 2002), which are
more and more regarded as essential to ensuring quality in complex software
systems.

In the remainder of this paper we will see how the algebraic treatment of the
standard theory along the exercise proposed by Bussche equips relational data
with functional types and an associated type system which can be used to type
check database operations. Interestingly, such a typed approach to database
programming will be shown to relate to other programming logics such as, e.g.,
Hoare logic (Hoare, 1969) or that of strongest invariant functions (Mili et al.,
1985), which has been used in the analysis of while statements, for instance.

On the whole, the approach has a unifying theories of programming (Hoare and
Jifeng, 1998) flavour, even though the exercise is not carried out "avant la
lettre" in canonical UTP. A full account can be found in a technical report
(Oliveira, 2011). Due to space constraints, this paper only covers the first
part of the exercise, that of developing a type system for relational data
which stems from functional dependencies.
Paper structure. Section 2 introduces functional dependencies (FD) and
shows how to convert the standard definition into the Tarskian, quantifier-free
style. The parallel between the functions as types approach which emerges from
such a conversion and a similar treatment of Hoare logic is given in section 3.
Section 4 shows that, in essence, injectivity is what matters in FDs and gives a
corresponding, simpler definition of FD which is used in section 5 to re-factor
the standard theory into a type system of FDs. Section 6 shows how to use
this type system to type check database operations and section 7 shows how to
calculate query optimizations from FDs. The last section gives an account of
related work and concludes with a prospect for future work.



2 Introducing functional dependencies 

In standard relational data processing, real life objects or entities are recorded
by assigning values to their observable properties or attributes. A database file
(vulg. table) is a collection of such attribute assignments, one per object, such
that all values of a particular attribute (say $i$) are of the same type (say $A_i$).
For $n$ such attributes, a relational database file $T$ can be regarded as a set of
$n$-tuples, that is, $T \subseteq A_1 \times \ldots \times A_n$. A relational
database is just a collection of several such $n$-ary relations, or tables.

Attribute names normally replace natural numbers in the identification of
attributes. The enumeration of all attribute names in a database table, for
instance $S = \{\mathit{Pilot}, \mathit{Flight}, \mathit{Date}, \mathit{Departs}\}$
concerning the airline scheduling system given as example in (Maier, 1983), is a
finite set called the table's scheme. This scheme captures the syntax of the
data. What about semantics? Even non-experts in airline scheduling will accept
"business rules" such as, for instance: a single pilot is assigned to a given
flight, on a given date. This restriction is an example of a so-called
functional dependency (FD) among attributes, which can be stated more formally
by writing "$\mathit{Flight}\ \mathit{Date} \to \mathit{Pilot}$" to mean that
attribute Pilot is functionally dependent on Flight and Date, or that Flight,
Date functionally determine Pilot.

Data dependencies capture the meaning of relational data. Data dependency
theory involves not only functional dependencies (FD) but also multi-valued
dependencies (MVD). Both are central to the standard theory, where they are
addressed in an axiomatic way. Maier (1983) provides the following definition
for FD-satisfiability:

Definition 1 Given subsets $x, y \subseteq S$ of the relation scheme $S$ of an
$n$-ary relation $T$, this relation is said to satisfy functional dependency
$x \to y$ iff all pairs of tuples $t, t' \in T$ which "agree" on $x$ also
"agree" on $y$, that is,

$$\forall\, t, t' : t, t' \in T \Rightarrow (\, t[x] = t'[x] \Rightarrow t[y] = t'[y] \,) \qquad (1)$$

(The notation $t[a]$ in (1) means "the value exhibited by attribute $a$ in tuple $t$".)
□
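Definition 1 can be read directly as an executable check. The following is a minimal Python sketch under stated assumptions: rows are dictionaries keyed by attribute name, and the sample rows (pilots `P1`, `P2` and so on) are hypothetical data invented for illustration, not taken from (Maier, 1983).

```python
from itertools import product


def satisfies_fd(table, x, y):
    """Check Definition 1: rows that agree on attributes x also agree on y."""

    def project(row, attrs):
        # t[x]: the values exhibited by the attributes in x, in a fixed order
        return tuple(row[a] for a in attrs)

    return all(
        project(t, y) == project(u, y)
        for t, u in product(table, repeat=2)
        if project(t, x) == project(u, x)
    )


# Hypothetical rows over the scheme {Pilot, Flight, Date, Departs}
schedule = [
    {"Pilot": "P1", "Flight": 83, "Date": "9 Aug", "Departs": "10:15"},
    {"Pilot": "P2", "Flight": 83, "Date": "10 Aug", "Departs": "10:15"},
    {"Pilot": "P1", "Flight": 116, "Date": "10 Aug", "Departs": "13:25"},
]

# Flight Date -> Pilot holds; Pilot -> Flight does not (P1 flies 83 and 116)
fd_holds = satisfies_fd(schedule, ["Flight", "Date"], ["Pilot"])
fd_fails = satisfies_fd(schedule, ["Pilot"], ["Flight"])
```

The nested quantification over pairs of rows is exactly the "two-dimensional" universal quantification of formula (1).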
How does one express formula (1) in Tarski's relation algebra style, doing
away with the two-dimensional universal quantification and the logical
implications inside? For so doing we need to settle some notation. To begin
with, $t[x]$ is better written as $x(t)$, where $x$ is identified with the
projection function associated to attribute set $x$. Regarding $x$ and $y$ in
(1) as such functions we write:

$$\forall\, t, t' : t, t' \in T \Rightarrow (\, x(t) = x(t') \Rightarrow y(t) = y(t') \,) \qquad (2)$$

Next, we observe that, given a function $f : A \to B$, the binary relation
$R \subseteq A \times A$ which checks whether two values of $A$ have the same
image under $f$ (that is, $a' R a \equiv f(a') = f(a)$),¹ can be written
alternatively as $a'(f^\circ \cdot f)a$. Here, $f^\circ$ denotes the converse of
$f$ (that is, $a(f^\circ)b$ holds iff $b = f\,a$) and the dot ($\cdot$) denotes
the extension of function composition to binary relations:

$$b(R \cdot S)c \equiv \exists\, a : b R a \wedge a S c \qquad (3)$$

Using converse and composition, the rightmost implication of (2) can be
rewritten into $t(x^\circ \cdot x)t' \Rightarrow t(y^\circ \cdot y)t'$, for all
$t, t' \in T$. Implications such as this can be expressed as relation
inclusions, following the definition:

$$R \subseteq S \equiv \forall\, b, a : b R a \Rightarrow b S a \qquad (4)$$

However, just stating the inclusion $x^\circ \cdot x \subseteq y^\circ \cdot y$
would be a gross error, for the double scope of the quantification
($t \in T \wedge t' \in T$) would not be taken into account. To handle this, we
first unnest the two implications of (2),

$$\forall\, t, t' : (t \in T \wedge t' \in T \wedge t(x^\circ \cdot x)t') \Rightarrow t(y^\circ \cdot y)t'$$

and treat the antecedent $t \in T \wedge t' \in T \wedge t(x^\circ \cdot x)t'$
independently, by replacing the set of tuples $T$ by the binary relation $[T]$
defined as follows:²

$$b[T]a \equiv b = a \wedge a \in T \qquad (5)$$

Note that $t \in T$ can be expressed in terms of $[T]$ by
$\exists\, u : u = t \wedge t[T]u$ and similarly for $t' \in T$. Then:

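$$\begin{array}{cl}
 & t \in T \wedge t' \in T \wedge t(x^\circ \cdot x)t' \\
\equiv & \{\ \text{expansion of } t \in T \text{ and } t' \in T\ \} \\
 & \exists\, u, u' : u = t \wedge u' = t' \wedge t[T]u \wedge t'[T]u' \wedge t(x^\circ \cdot x)t' \\
\equiv & \{\ \wedge \text{ is commutative; equal by equal substitution; converse}\ \} \\
 & \exists\, u, u' : t[T]u \wedge u(x^\circ \cdot x)u' \wedge u'[T]^\circ t' \\
\equiv & \{\ \text{composition (3) twice}\ \} \\
 & t([T] \cdot x^\circ \cdot x \cdot [T]^\circ)t'
\end{array}$$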

1 This is known as the nucleus (Mili et al., 1985) or kernel (Oliveira, 2009) of
a function $f$.

2 This is a standard way of encoding a set $T$ as a binary relation $[T]$ known
as a partial identity, since $[T] \subseteq id$. The set of all such relations
forms a Boolean algebra which reproduces the usual algebra of sets. Moreover,
partial identities are symmetric ($[T]^\circ = [T]$) and such that
$[S] \cdot [T] = [S] \cap [T]$.
Finally, by putting this together with $t(y^\circ \cdot y)t'$ we obtain

$$[T] \cdot x^\circ \cdot x \cdot [T]^\circ \subseteq y^\circ \cdot y \qquad (6)$$

as a quantifier-free relation algebra expression meaning the same as (1).
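The equivalence between (1) and (6) can be spot-checked on small finite data. The sketch below is an illustration only, assuming relations are encoded as Python sets of pairs `(b, a)` standing for $b\,R\,a$; the helper names (`converse`, `compose`, `as_relation`, `partial_identity`) and the sample tuples are hypothetical choices made here, not notation from the paper.

```python
from functools import reduce


def converse(r):
    """R°: swap the components of every pair."""
    return {(a, b) for (b, a) in r}


def compose(*rels):
    """Composition as in (3): b (R.S) c iff b R a and a S c for some a."""

    def compose2(r, s):
        return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}

    return reduce(compose2, rels)


def as_relation(f, domain):
    """A function f as the relation {(f a, a)}: b f a iff b = f a."""
    return {(f(a), a) for a in domain}


def partial_identity(s):
    """[T] as in (5): b [T] a iff b = a and a in T."""
    return {(a, a) for a in s}


def fd_pointfree(table, x, y):
    """Inclusion (6): [T] . x° . x . [T]° is contained in y° . y."""
    t = partial_identity(table)
    fx, fy = as_relation(x, table), as_relation(y, table)
    lhs = compose(t, converse(fx), fx, converse(t))
    rhs = compose(converse(fy), fy)
    return lhs <= rhs


def fd_quantified(table, x, y):
    """Formula (2), with explicit quantification over pairs of tuples."""
    return all(y(t) == y(u) for t in table for u in table if x(t) == x(u))


# Hypothetical tuples (pilot, flight, date)
table = {("P1", 83, "9 Aug"), ("P2", 83, "10 Aug"), ("P1", 116, "10 Aug")}
flight_date = lambda t: (t[1], t[2])
pilot = lambda t: t[0]
flight = lambda t: t[1]
```

Both formulations agree on this table: `flight_date` determines `pilot`, while `pilot` does not determine `flight`.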

Generalization. To reassure the reader worried about the doubtful
practicality of derivations such as the above, we would like to say that we
don't need to do it over and over again: inequality (6), our Tarskian
alternative to the original textbook definition (1), is all we need for
calculating with functional dependencies. Moreover, we can start this by
actually expanding the scope of the definition from sets of tuples $[T]$ and
attribute functions ($x$, $y$) to arbitrary binary relations $R$ and suitably
typed functions $f$ and $g$:

$$R \cdot f^\circ \cdot f \cdot R^\circ \subseteq g^\circ \cdot g \qquad (7)$$

In this wider setting, $R$ can be regarded not only as a piece of data but also
as the specification of a nondeterministic computation, or even the transition
relation of a finite-state automaton; and $f$ (resp. $g$) as a function which
observes the input (resp. output) of $R$. Put back into quantified logic, such a
wider notion of a functional dependency will expand as follows:

$$\forall\, a', a : f(a') = f(a) \Rightarrow (\forall\, b', b : b' R a' \wedge b R a \Rightarrow g(b') = g(b)) \qquad (8)$$

In words: inputs $a$, $a'$ indistinguishable by $f$ can via $R$ only lead to
outputs indistinguishable by $g$. Notationally, we will convey this
interpretation by writing $R : f \to g$ or $f \xrightarrow{R} g$. We can still
say that $R$ satisfies the $f \to g$ FD, in particular wherever $R$ is a piece
of data. As can be easily checked, $f(a') = f(a)$ is an equivalence relation
which, in the wider setting, can be regarded as the semantics of the datatype
which $R$ takes inputs from (think of $f : A \to B$ as a semantic function
mapping a syntactic domain $A$ into a semantic domain $B$), and similarly for
$g$ concerning the output type.

Summing up, the functions $f$ and $g$ in (7) can be regarded as types for
$R$. Some type assertions of this kind will be very easy to check, for instance
$id : f \to f$, just by replacing $R, f, g := id, f, f$ in (7) and simplifying.
But type inference will be easier to calculate on top of the even simpler
(re)statement of (7) which is given next.



3 Functions as types 

Before proceeding let us record two properties of the relational operators
converse and composition:³

$$(R \cdot S)^\circ = S^\circ \cdot R^\circ \qquad (9)$$
$$(R^\circ)^\circ = R \qquad (10)$$

3 It may help to recall the same properties from elementary linear algebra, once converse is
interpreted as matrix transposition and composition as matrix-matrix multiplication.






Moreover, it will be convenient to have a name for the relation
$R^\circ \cdot R$ which, for $R$ a function $f$, is the equivalence relation
"indistinguishable by $f$" seen above. We define

$$\ker R = R^\circ \cdot R \qquad (11)$$

and read $\ker R$ as "the kernel of $R$". Clearly, $a'(\ker R)a$ means
$\exists\, b : b R a' \wedge b R a$ and therefore $\ker R$ measures the
injectivity of $R$: the larger it is, the larger the set of inputs which $R$ is
unable to distinguish (that is, the less injective $R$ is).

We capture this by introducing a preorder on relations which compares their
injectivity:

$$R \leq S \equiv \ker S \subseteq \ker R \qquad (12)$$

As an example, take two list functions, $\mathit{elems}$ computing the set of
all elements of a list, and $\mathit{bagify}$ keeping the bag of such elements.
The first loses more information (order and multiplicity) than the latter,
which only forgets about order. Thus $\mathit{elems} \leq \mathit{bagify}$. A
function $f$ (relation in general) will be injective iff
$\ker f \subseteq id$ ($id \leq f$), which easily converts to the usual
definition: $f(a') = f(a) \Rightarrow a' = a$.
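The injectivity preorder (12) can be tried out on a finite sample of lists. This is a minimal sketch, assuming lists are modelled as Python tuples over the alphabet `"ab"` and bags as frozen sets of (element, count) pairs; `kernel`, `leq` and `is_injective` are illustrative helper names introduced here.

```python
from collections import Counter
from itertools import product


def kernel(f, domain):
    """ker f restricted to a finite domain: pairs that f cannot tell apart."""
    return {(a2, a) for a2, a in product(domain, repeat=2) if f(a2) == f(a)}


def leq(f, g, domain):
    """f <= g as in (12): ker g is contained in ker f."""
    return kernel(g, domain) <= kernel(f, domain)


def is_injective(f, domain):
    """ker f contained in id: only equal arguments are identified."""
    return all(a2 == a for (a2, a) in kernel(f, domain))


def elems(xs):
    return frozenset(xs)  # forgets order and multiplicity


def bagify(xs):
    return frozenset(Counter(xs).items())  # forgets order only


def identity(xs):
    return xs


# All lists (as tuples) over {a, b} of length 0 to 3
lists = [p for n in range(4) for p in product("ab", repeat=n)]
```

On this sample `leq(elems, bagify, lists)` holds while `leq(bagify, elems, lists)` does not, since `("a",)` and `("a", "a")` have the same elements but different bags.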

Summing up: for functions or any totally defined relations $R$ and $S$,⁴
$R \leq S$ means that $R$ is less injective than $S$; for possibly partial $R$
and $S$, it will mean that $R$ is less injective or more defined than $S$.

Therefore, for total relations $R$ the preorder is universally bounded,

$$! \leq R \leq id$$

where the infimum is captured by the constant function $!$ which maps every
argument to a given (predefined) value, the choice of such value being
irrelevant.⁵ The kernel of $!$ is therefore the largest possible, denoted by
$\top$ (for "top"). The other bound is trivial to check, since $\ker id = id$,
this arising from the well-known fact that $id$ is the unit of composition. In
general, $id \leq R$ means $R$ is injective.

4 A relation $R$ is totally defined (or entire) iff $id \subseteq \ker R$.

5 Note that $R \leq S$ is a preorder, not a partial order, meaning that two
relations indistinguishable with respect to their degree of injectivity can be
different.

Equipped with this ordering, we may spruce up our relational characterization
of the $f \xrightarrow{R} g$ type assertion, or functional dependency (FD):

$$\begin{array}{cl}
 & f \xrightarrow{R} g \\
\equiv & \{\ \text{definition (7)}\ \} \\
 & R \cdot f^\circ \cdot f \cdot R^\circ \subseteq g^\circ \cdot g \\
\equiv & \{\ \text{converses (9, 10); kernel (11)}\ \} \\
 & \ker (f \cdot R^\circ) \subseteq \ker g \\
\equiv & \{\ \text{(12): } g \text{ is "less injective than } f \text{ wrt. } R\text{"}\ \} \\
 & g \leq f \cdot R^\circ
\end{array}$$

We thus reach a rather elegant formula for expressing functional dependencies,
whose layout invites us to actually swap the direction of the arrow notation
(but, of course, this is just a matter of taste):

Definition 2 Given an arbitrary binary relation $R \subseteq A \times B$ and
functions $f : B \to D$ and $g : A \to C$, given $A, B, \ldots, D$, the "type
assertion" $g \xleftarrow{R} f$ meaning that $R$ satisfies FD $f \to g$ is given
by the equivalence:

$$g \xleftarrow{R} f \equiv g \leq f \cdot R^\circ \qquad (13)$$

□

Intuitively, $g \xleftarrow{R} f$ means that $g$ will be blinder (less
injective) to the outputs of $R$ than $f$ is concerning its inputs.
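To see Definition 2 at work on something other than a database table, the sketch below checks (13) against the quantified formula (8) for a small nondeterministic relation. It is an illustration under assumptions made here: relations are sets of pairs `(b, a)` for $b\,R\,a$, and `step` (which either keeps a number or adds 2 to it), `parity` and `same` are hypothetical examples.

```python
def converse(r):
    return {(a, b) for (b, a) in r}


def compose(r, s):
    """b (R.S) c iff b R a and a S c for some a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}


def ker(r):
    """Kernel (11): R° . R."""
    return compose(converse(r), r)


def leq(r, s):
    """Injectivity preorder (12): ker S contained in ker R."""
    return ker(s) <= ker(r)


def as_relation(f, domain):
    return {(f(a), a) for a in domain}


def has_type(r, f, g, inputs, outputs):
    """Definition 2, (13): g <-R- f iff g <= f . R°."""
    return leq(as_relation(g, outputs), compose(as_relation(f, inputs), converse(r)))


def has_type_quantified(r, f, g):
    """Formula (8): f-indistinguishable inputs lead to g-indistinguishable outputs."""
    return all(g(b2) == g(b) for (b2, a2) in r for (b, a) in r if f(a2) == f(a))


inputs, outputs = range(6), range(8)
# Hypothetical nondeterministic computation: from a, produce either a or a + 2
step = {(a, a) for a in inputs} | {(a + 2, a) for a in inputs}
parity = lambda n: n % 2
same = lambda n: n
```

Here `has_type(step, parity, parity, inputs, outputs)` holds, whereas taking `same` as output type fails: even a single input may lead to two distinct outputs.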

There are two main advantages in definition (13), besides saving ink. The
most important is that it takes advantage of the calculus of injectivity which
will be addressed in the following section. The other is that it makes it easy to
bridge with other programming logics, as is seen next.

Parallel with Hoare logic. As is widely known, Hoare logic is based on
triples of the form $\{p\}R\{q\}$, with the standard interpretation: "if the
assertion $p$ is true before initiation of a program $R$, then the assertion $q$
will be true on its completion" (Hoare, 1969).

Let program $R$ be identified with the relation which captures its state
transition semantics and predicates $p$ (and $q$) be identified with
$s'[p]s \equiv s' = s \wedge p(s)$ (similarly for $q$), the same trick we used
for converting sets to binary relations in section 2. (Note how $[p]$ can be
regarded as the semantics of a statement which checks $p(s)$ and does not
change state, failing otherwise.) In relation algebra this is captured by⁶

$$\{p\}R\{q\} \equiv \mathrm{rng}(R \cdot [p]) \subseteq [q]$$

6 See (Oliveira, 2009) and references there to related work.
meaning that the outputs of $R$ (given by the range operator $\mathrm{rng}$)
for inputs pre-conditioned by $p$ don't fall outside $q$; that is, $q$ is weaker
than the strongest post-condition $sp(R, p)$, something we can express by
writing

$$\{p\}R\{q\} \equiv q \leq p \cdot R^\circ \qquad (14)$$

under a suitable preorder $\leq$ expressing that $q$ is less constrained than
$p \cdot R^\circ$.⁷

In spite of the different semantic context, there is a striking formal similarity
between formulas (14) and (13), suggesting that Hoare logic and the logic we want
to build for FDs share the same mathematics once expressed in relation algebra.
Such similarities will become apparent in the sequel, where we are going to
write $p \xrightarrow{R} q$ for $\{p\}R\{q\}$, to put the notations closer.
Using this notation, rules such as, e.g., the rule of composition,
$\{p\}R_1\{q\} \wedge \{q\}R_2\{r\} \Rightarrow \{p\}R_1;R_2\{r\}$, become:⁸

$$p \xrightarrow{R_1} q \ \wedge\ q \xrightarrow{R_2} r \ \Rightarrow\ p \xrightarrow{R_1;R_2} r \qquad (15)$$

We will check the FD equivalent to (15) shortly.
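The relational reading of Hoare triples can be exercised on a toy state space. This is a minimal sketch, assuming programs are finite sets of pairs `(s2, s)` meaning "state `s` may lead to state `s2`", and taking $R_1;R_2$ to be the relational composition $R_2 \cdot R_1$ (an assumption about how sequencing is encoded); the programs `increment` and `double` and the three predicates are hypothetical.

```python
def compose(r, s):
    """b (R.S) c iff b R a and a S c for some a."""
    return {(b, c) for (b, a) in r for (a2, c) in s if a == a2}


def hoare(p, r, q):
    """{p} R {q}: every output of R on an input satisfying p satisfies q."""
    return all(q(s2) for (s2, s) in r if p(s))


def seq(r1, r2):
    """R1;R2 (run R1, then R2), encoded as the composition R2 . R1."""
    return compose(r2, r1)


# Hypothetical programs over small integer states
increment = {(s + 1, s) for s in range(10)}
double = {(2 * s, s) for s in range(11)}

even = lambda s: s % 2 == 0
odd = lambda s: s % 2 == 1
two_mod_four = lambda s: s % 4 == 2

premises = hoare(even, increment, odd) and hoare(odd, double, two_mod_four)
conclusion = hoare(even, seq(increment, double), two_mod_four)
```

Both premises of rule (15) hold for these programs, and so does its conclusion.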

4 A calculus of injectivity ($\leq$)

One of the advantages of relation algebra is its easy "tuning" to special needs,
which we will illustrate below concerning the algebra of injectivity. We give
just an example, taken from (Oliveira, 2011); the reader is referred to this
technical report for the whole story.

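We start by considering two rules of relation algebra which prove very useful
in program calculation:

$$f \cdot R \subseteq S \equiv R \subseteq f^\circ \cdot S \qquad (16)$$
$$R \cdot f^\circ \subseteq S \equiv R \subseteq S \cdot f \qquad (17)$$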



In these equivalences,⁹ which are widely known as shunting rules (Bird and de
Moor, 1997), $f$ is required to be a (total) function. In essence, they let one
trade a function $f$ from one side to the other of a $\subseteq$-equation just
by taking converses. (This is akin to "changing sign" in trading terms in
inequations of elementary algebra.)

7 Details: $\{p\}R\{q\}$ is $\mathrm{rng}(R \cdot [p]) \subseteq [q]$, itself
the same as $\mathrm{dom}([p] \cdot R^\circ) \subseteq \mathrm{dom}[q]$ since
$\mathrm{dom}$ (domain) and $\mathrm{rng}$ (range) commute with converse and the
domain of a partial identity is itself. The preorder is
$R \leq S \equiv \mathrm{dom}\, S \subseteq \mathrm{dom}\, R$. Parentheses
$[\_]$ are dropped to make the formula lighter to read.

8 The arrow notation for Hoare triples, reminiscent of that of labelled
transition systems, is adopted in, e.g., (Oliveira, 2009).

9 Technically, these equivalences should be regarded as (families of) Galois
connections (Oliveira, 2009).

It would be useful to have similar rules for the injectivity preorder, which we
have chosen as support for our definition of a FD (13). It turns out that such
rules are quite easy to infer, as is the case of the Galois connection for
trading a function $f$ with respect to the injectivity preorder given by

$$R \cdot f \leq S \equiv R \leq S \cdot f^\circ \qquad (18)$$

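which takes just three steps to calculate:

$$\begin{array}{cl}
 & R \cdot f \leq S \\
\equiv & \{\ \text{definition (12); converses (9, 10); kernel (11)}\ \} \\
 & \ker S \subseteq f^\circ \cdot (\ker R) \cdot f \\
\equiv & \{\ \text{shunting rules (16, 17)}\ \} \\
 & f \cdot \ker S \cdot f^\circ \subseteq \ker R \\
\equiv & \{\ \text{converses, kernel and definition (12) again}\ \} \\
 & R \leq S \cdot f^\circ
\end{array}$$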
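Rule (18) can also be confirmed by brute force on a tiny carrier, in the spirit of a counterexample search. The sketch below is only an illustration: it enumerates every relation and every total function on the two-element set `A = (0, 1)` (a choice made here for tractability) and checks that both sides of (18) always agree. It is a finite sanity check, not a proof.

```python
from itertools import combinations, product

A = (0, 1)


def converse(r):
    return frozenset((a, b) for (b, a) in r)


def compose(r, s):
    """b (R.S) c iff b R a and a S c for some a."""
    return frozenset((b, c) for (b, a) in r for (a2, c) in s if a == a2)


def ker(r):
    return compose(converse(r), r)


def leq(r, s):
    """Injectivity preorder (12)."""
    return ker(s) <= ker(r)


def all_relations():
    """Every subset of A x A."""
    pairs = list(product(A, repeat=2))
    return [frozenset(c) for n in range(len(pairs) + 1) for c in combinations(pairs, n)]


def all_functions():
    """Every total function A -> A, as the relation {(f a, a)}."""
    return [frozenset((img[a], a) for a in A) for img in product(A, repeat=len(A))]


def galois_18_holds():
    rels = all_relations()
    return all(
        leq(compose(r, f), s) == leq(r, compose(s, converse(f)))
        for f in all_functions()
        for r in rels
        for s in rels
    )
```

The search covers 4 functions and 16 x 16 pairs of relations, so it finishes instantly.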

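Let us put this new rule to work for us in the derivation of a trading rule
which will enable handling composite antecedent and consequent functions in
FDs:

$$y \xleftarrow{z \cdot R \cdot k^\circ} x \ \equiv\ y \cdot z \xleftarrow{R} x \cdot k \qquad (19)$$

Thanks to (18), the calculation of (19) is immediate:

$$\begin{array}{cl}
 & y \xleftarrow{z \cdot R \cdot k^\circ} x \\
\equiv & \{\ \text{definition (13); converses}\ \} \\
 & y \leq x \cdot k \cdot R^\circ \cdot z^\circ \\
\equiv & \{\ \text{new shunting rule (18)}\ \} \\
 & y \cdot z \leq (x \cdot k) \cdot R^\circ \\
\equiv & \{\ \text{definition (13)}\ \} \\
 & y \cdot z \xleftarrow{R} x \cdot k
\end{array}$$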

Another result which will help in the sequel is

$$X \leq R \cup S \equiv X \leq R \wedge X \leq S \wedge R^\circ \cdot S \subseteq \ker X \qquad (20)$$

where $R \cup S$ is the union of relations $R$ and $S$.¹⁰ For $X := id$, (20)
tells that $R \cup S$ is injective iff both $R$ and $S$ are injective and don't
"confuse" each other: wherever $b S a$ and $b R c$ hold, $c = a$.

10 See (Oliveira, 2011) for the proof of this and other results of the algebra
of injectivity.
5 Building a type system of FDs

The machinery set up in the previous sections is enough for developing a type
system whereby dependencies, relations and operations on relations are tied
together, as Beeri et al. (1977) envisaged.



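Composition rule. FDs on relations with matching antecedent and consequent
functions (as types) compose:

$$h \xleftarrow{S} g \ \wedge\ g \xleftarrow{R} f \ \Rightarrow\ h \xleftarrow{S \cdot R} f \qquad (21)$$

Proof:

$$\begin{array}{cl}
 & h \xleftarrow{S} g \ \wedge\ g \xleftarrow{R} f \\
\equiv & \{\ \text{(13) twice}\ \} \\
 & h \leq g \cdot S^\circ \ \wedge\ g \leq f \cdot R^\circ \\
\Rightarrow & \{\ \leq\text{-monotonicity of } (\cdot S^\circ)\text{; converses (9)}\ \} \\
 & h \leq g \cdot S^\circ \ \wedge\ g \cdot S^\circ \leq f \cdot (S \cdot R)^\circ \\
\Rightarrow & \{\ \leq\text{-transitivity}\ \} \\
 & h \leq f \cdot (S \cdot R)^\circ \\
\equiv & \{\ \text{(13) again}\ \} \\
 & h \xleftarrow{S \cdot R} f
\end{array}$$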

This rule is the FD counterpart of the rule of composition in Hoare logic
(15), for $R$ and $S$ regarded as describing computations.¹¹



Consequence (weakening/strengthening) rule:

$$k \xleftarrow{R} h \ \Leftarrow\ k \leq g \ \wedge\ g \xleftarrow{R} f \ \wedge\ f \leq h \qquad (22)$$



Proof: See (Oliveira, 2011), where this rule is shown to subsume and generalize
standard Armstrong axioms F2 (Augmentation) and F4 (Projectivity). In the
parallel with Hoare logic, it corresponds to the two rules of consequence
(Hoare, 1969) which, put together and writing triples as arrows, become

$$q' \xleftarrow{P} p' \ \Leftarrow\ (q' \Leftarrow q) \ \wedge\ q \xleftarrow{P} p \ \wedge\ (p \Leftarrow p')$$

for $P$ a program and $p$, $q$, etc. program assertions.



11 For $R$ and $S$ the same database table, this rule subsumes Armstrong axiom
F5 (Transitivity) in the standard FD theory (Maier, 1983). The calculation of
this and other similar results stated in this paper can be found in (Oliveira,
2011).
Reflexivity. We have seen already that

$$f \xleftarrow{id} f \qquad (23)$$

holds trivially. This rule, which corresponds to the "skip" rule of Hoare logic,
$p \xleftarrow{skip} p$, is easily shown to hold for any set $T$,

$$f \xleftarrow{[T]} f \qquad (24)$$

as FDs are downward closed. Rule (24) is known as Armstrong axiom F1
(Reflexivity).

Note in passing that (21) and (23) together define a category whose objects
are functions (types) and whose morphisms (arrows) are FDs.

6 Type checking database operations 

Let us proceed to an example of database operation type checking: we want to
know what it means for the merging of two database files to satisfy a particular
functional dependency $f \to g$. That is, we want to find a sufficient
condition for the union $R \cup S$ of two relations $R$ and $S$ to be of type
$f \to g$. The algebra of injectivity does most of the work:

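$$\begin{array}{cl}
 & g \xleftarrow{R \cup S} f \\
\equiv & \{\ \text{definition (13); converse distributes by union}\ \} \\
 & g \leq f \cdot (R^\circ \cup S^\circ) \\
\equiv & \{\ \text{relational composition distributes through union}\ \} \\
 & g \leq f \cdot R^\circ \cup f \cdot S^\circ \\
\equiv & \{\ \text{algebra of injectivity (20); definition (13) again, twice}\ \} \\
 & g \xleftarrow{R} f \ \wedge\ g \xleftarrow{S} f \ \wedge\ R \cdot \ker f \cdot S^\circ \subseteq \ker g \\
\equiv & \{\ \text{introduce "mutual dependency" shorthand}\ \} \\
 & g \xleftarrow{R} f \ \wedge\ g \xleftarrow{S} f \ \wedge\ g \xleftarrow{R,S} f
\end{array}$$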



The "mutual dependency" shorthand $g \xleftarrow{R,S} f$ introduced in the last
step for $R \cdot \ker f \cdot S^\circ \subseteq \ker g$ can be read as a
generalization of the standard definition of a FD to two relations instead of
one: just generalize the second $R$ in (7) to some $S$. For $R$ and $S$ two sets
of tuples, it means that grabbing one tuple from one set and another tuple from
the other set, if they cannot be distinguished by $f$ then they will remain
indistinguishable by $g$.
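For two sets of tuples, the three conjuncts at the bottom of the calculation translate into three simple checks. The following sketch assumes tables are Python sets of tuples `(flight, date, pilot)`; the sample rows and the helper names `satisfies`, `mutual` and `union_typechecks` are hypothetical.

```python
def satisfies(table, f, g):
    """g <-T- f for a set of tuples T (Definition 1 with functions f, g)."""
    return all(g(t) == g(u) for t in table for u in table if f(t) == f(u))


def mutual(r, s, f, g):
    """Mutual dependency g <-R,S- f: one tuple from each table."""
    return all(g(t) == g(u) for t in r for u in s if f(t) == f(u))


def union_typechecks(r, s, f, g):
    """The three conjuncts obtained for the union of R and S."""
    return satisfies(r, f, g) and satisfies(s, f, g) and mutual(r, s, f, g)


flight_date = lambda t: (t[0], t[1])
pilot = lambda t: t[2]

# Hypothetical tables of (flight, date, pilot) tuples
r = {(83, "9 Aug", "P1"), (116, "10 Aug", "P1")}
s_ok = {(83, "10 Aug", "P2")}
s_clash = {(83, "9 Aug", "P2")}  # same flight and date as a row of r, other pilot
```

Each of `r`, `s_ok` and `s_clash` satisfies the FD on its own, yet merging `r` with `s_clash` breaks it; the mutual dependency conjunct is what detects the clash.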
It should be stressed that the bottom line of the calculation expresses not
only a sufficient but also a necessary condition for
$g \xleftarrow{R \cup S} f$ to hold, as all steps are equivalences.

Type checking other database operations will follow the same scheme. Below
we handle one particular such operation, relational join (Maier, 1983), in
detail. This is justified not only for its relevance in data processing but also
because it brings about other standard FD rules not yet addressed.



Joining and pairing. Recall from section 1 how Bussche (2001) explains the
relevance of Tarski's work on pairing in relation algebra by illustrating how
a ternary (in general, $n$-ary) relation $\{(a, b, c), (d, e, f)\}$ gets
represented by a binary one, $\{(a, (b, c)), (d, (e, f))\}$.

Pairing is not only useful for ensuring that sets of arbitrarily long (but finite)
tuples are represented by binary relations but also for defining the join operator
($\bowtie$) on such sets. In fact, this operator is particularly handy to
express in case the two sets of tuples are already represented as binary
relations $R$ and $S$:

$$(a, b)(R \bowtie S)c \equiv a R c \wedge b S c \qquad (25)$$

Interestingly, relational join behaves as a least upper bound with respect to the
injectivity preorder:¹²

$$R \bowtie S \leq T \equiv R \leq T \wedge S \leq T \qquad (26)$$

This combinator turns out to be more general than its use in data
processing.¹³ In particular, when $R$ and $S$ are functions $f$ and $g$,
$f \bowtie g$ is the obvious function which pairs the outputs of $f$ and $g$:
$(f \bowtie g)x = (f\,x, g\,x)$. Think for instance of the projection function
$f_x$ (resp. $f_y$) which, in the context of Definition 1, yields $t[x]$ (resp.
$t[y]$) when applied to a tuple $t$. Then
$(f_x \bowtie f_y)t = (t[x], t[y]) = t[xy]$, where $xy$ denotes the union of
attributes $x$ and $y$ (Maier, 1983). So, attribute union corresponds to joining
the corresponding projection functions. This gives us a quite uniform framework
for handling both relational join and compound attributes. To make notation
closer to what is common in data dependency theory we will abbreviate
$f_x \bowtie f_y$ to $f_x f_y$ and this even further to $xy$, identifying (as we
did before) each attribute (e.g. $x$) with the corresponding projection function
(e.g. $f_x$).

Minding this abbreviation $fg$ of $f \bowtie g$, for functions, from (26) it is
easy to derive facts $! \leq f \leq id$ and $f \leq fg$, $g \leq fg$. This is
consistent with the use of juxtaposition to denote "sets of attributes". In this
context, $\leq$ can be regarded as expressing "attribute inclusion". In fact,
the more attributes one observes the more injective the projection function
corresponding to such attributes is.¹⁴



12 See details and proof in (Oliveira, 2011).

13 It is termed split in (Bird and de Moor, 1997) and fork in (Frias et al., 1997).

14 This parallel between attribute sets ordered by inclusion and projection
functions ordered by injectivity is dealt with in detail in (Oliveira, 2011).
Note how $!$ mimics the empty set and $id$ mimics the whole set of attributes,
enabling one to "see the whole thing" and thus discriminating as much as
possible.
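A first illustration of this unified framework is given below: the (generic)
calculation of the so-called Armstrong axioms F3 (Additivity) and F4
(Projectivity).¹⁵ This is done in one go, for arbitrary (suitably typed) $R$,
$f$, $g$, $h$:¹⁶

$$gh \xleftarrow{R} f \ \equiv\ g \xleftarrow{R} f \ \wedge\ h \xleftarrow{R} f \qquad (27)$$

Calculation:

$$\begin{array}{cl}
 & gh \xleftarrow{R} f \\
\equiv & \{\ \text{(13); expansion of shorthand } gh\ \} \\
 & g \bowtie h \leq f \cdot R^\circ \\
\equiv & \{\ \text{universal property of } \bowtie \text{ (26)}\ \} \\
 & g \leq f \cdot R^\circ \ \wedge\ h \leq f \cdot R^\circ \\
\equiv & \{\ \text{(13) twice}\ \} \\
 & g \xleftarrow{R} f \ \wedge\ h \xleftarrow{R} f
\end{array}$$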
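Rule (27) is easy to observe on concrete tables once pairing of functions is available. A minimal sketch, assuming tables are sets of tuples `(flight, date, pilot, gate)`; the `gate` attribute, the sample rows and the helpers `satisfies` and `pair` are hypothetical additions made for illustration.

```python
def satisfies(table, f, g):
    """g <-T- f for a set of tuples T."""
    return all(g(t) == g(u) for t in table for u in table if f(t) == f(u))


def pair(g, h):
    """The pairing gh of two functions: observes both g and h at once."""
    return lambda t: (g(t), h(t))


flight_date = lambda t: (t[0], t[1])
pilot = lambda t: t[2]
gate = lambda t: t[3]

# Hypothetical tuples (flight, date, pilot, gate)
table_ok = {
    (83, "9 Aug", "P1", "G7"),
    (83, "10 Aug", "P2", "G7"),
    (116, "10 Aug", "P1", "G2"),
}
# Same flight, date and pilot as an existing row, but a different gate
table_bad = table_ok | {(83, "9 Aug", "P1", "G9")}


def additivity_projectivity(table):
    """Both sides of (27) for f = flight_date, g = pilot, h = gate."""
    lhs = satisfies(table, flight_date, pair(pilot, gate))
    rhs = satisfies(table, flight_date, pilot) and satisfies(table, flight_date, gate)
    return lhs, rhs
```

On `table_ok` both sides of (27) hold; on `table_bad` both fail, because `flight_date` no longer determines `gate`.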



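The type rule for the database join operator ($\bowtie$) is calculated in the
same way:

$$\begin{array}{cl}
 & g \xleftarrow{R} f \ \wedge\ h \xleftarrow{S} f \\
\Rightarrow & \{\ \text{let } \pi_1(y, x) = y \text{ and } \pi_2(y, x) = x\text{; FDs are downward closed}\ \} \\
 & g \xleftarrow{\pi_1 \cdot (R \bowtie S)} f \ \wedge\ h \xleftarrow{\pi_2 \cdot (R \bowtie S)} f \\
\equiv & \{\ \text{trading (19) twice}\ \} \\
 & g \cdot \pi_1 \xleftarrow{R \bowtie S} f \ \wedge\ h \cdot \pi_2 \xleftarrow{R \bowtie S} f \\
\equiv & \{\ \text{F3+F4 (27)}\ \} \\
 & (g \cdot \pi_1) \bowtie (h \cdot \pi_2) \xleftarrow{R \bowtie S} f \\
\equiv & \{\ \text{product of functions: } f \times g = (f \cdot \pi_1) \bowtie (g \cdot \pi_2)\ \} \\
 & g \times h \xleftarrow{R \bowtie S} f
\end{array}$$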



7 Beyond the type system: query optimization 

As explained above, FD theory (cf. Hoare logic) can be regarded as a type system
whose rules help in reasoning about data models (cf. programs) without going
into the semantic intricacies of data business rules (cf. program meanings). It
helps because quantified expressions such as in Definition 1 don't scale up very
well to large sets of dependencies. In this respect, our quantifier-free
equivalent (13) looks more tractable and is therefore expected to be
calculationally effective where the quantified equivalent is clumsy.

15 See (Maier, 1983).

16 In the Hoare logic counterpart of this rule $gh$ will be the conjunction of two assertions.



This will be illustrated below with a simple example, taken from Abiteboul et
al. (1995) and also addressed by Wisnesky (2012): one wants to optimize the
conjunctive query

$$\{(d, a') \mid t = t', (t, d, a) \in \mathit{Movies}, (t', d', a') \in \mathit{Movies}\} \qquad (28)$$

over a database file Movies(Title, Director, Actor) into a query accessing this
file only once, knowing that FD $\mathit{Title} \to \mathit{Director}$ holds.

Put in calculational format and abbreviating $M$ for Movies, $t$ for Title, $d$
for Director and $a$ for Actor, we want to solve for $X$ the equation

$$d \cdot M \cdot (\ker t) \cdot M \cdot a^\circ = X \qquad (29)$$

whose left hand side is the relational equivalent of (28).¹⁷ Our aim is to
obtain a solution $X$ containing only one instance of $M$. The equation is
solved by taking the FD itself as starting point and trying to re-write it into
something one recognizes as an instance of (29):

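$$\begin{array}{cl}
 & d \xleftarrow{M} t \\
\equiv & \{\ \text{(13)}\ \} \\
 & d \leq t \cdot M^\circ \\
\equiv & \{\ \text{expanding (11, 12); } M^\circ = M \text{ since } M \text{ is a set}\ \} \\
 & M \cdot t^\circ \cdot t \cdot M \subseteq d^\circ \cdot d \\
\equiv & \{\ \text{composition } (\cdot M) \text{ with a set (partial identity) is a closure operator}\ \} \\
 & M \cdot t^\circ \cdot t \cdot M \subseteq d^\circ \cdot d \cdot M \\
\Rightarrow & \{\ \text{shunting (16, 17); monotonicity of } (\cdot a^\circ)\text{; kernel (11)}\ \} \\
 & d \cdot M \cdot (\ker t) \cdot M \cdot a^\circ \subseteq d \cdot M \cdot a^\circ
\end{array}$$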

17 As the interested reader may check by introducing the variables back. Note
how $\ker t$ expresses $t = t'$ and projection functions $d$ (for Director) and
$a$ (for Actor) work over tuple $(t, d, a)$ and tuple $(t', d', a')$,
respectively. The use of the same letters for data variables and the
corresponding projection functions should help in tallying the two versions of
the query.

Thus we find $d \cdot M \cdot a^\circ$ as a candidate solution for $X$. To
obtain $X = d \cdot M \cdot a^\circ$ it remains to check the converse inclusion:

$$\begin{array}{cl}
 & d \cdot M \cdot a^\circ \subseteq d \cdot M \cdot (\ker t) \cdot M \cdot a^\circ \\
\Leftarrow & \{\ id \subseteq \ker t \text{ because kernels are equivalence relations}\ \} \\
 & d \cdot M \cdot a^\circ \subseteq d \cdot M \cdot M \cdot a^\circ \\
\equiv & \{\ M \cdot M = M \cap M = M \text{ because } M \text{ is a set}\ \} \\
 & d \cdot M \cdot a^\circ \subseteq d \cdot M \cdot a^\circ
\end{array}$$

Thus $X = d \cdot M \cdot a^\circ$, that is

$$X = \{(d, a') \mid (t, d, a') \in \mathit{Movies}\}$$

is the solution to equation (29) which optimizes the given query by only visiting
the movies file once.¹⁸
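The outcome of the calculation can be cross-checked on sample data. The sketch below is an illustration under stated assumptions: the movies file is a Python set of `(title, director, actor)` tuples, and the rows (`T1`, `D1`, `A1` and so on) are invented placeholders, not data from Abiteboul et al. (1995).

```python
def query_two_scans(movies):
    """Query (28): pairs (d, a') drawn from two tuples with the same title."""
    return {(d, a2) for (t, d, a) in movies for (t2, d2, a2) in movies if t == t2}


def query_one_scan(movies):
    """The optimized query X: a single pass over the file."""
    return {(d, a) for (t, d, a) in movies}


def title_determines_director(movies):
    """FD Title -> Director."""
    return all(d == d2 for (t, d, _) in movies for (t2, d2, _) in movies if t == t2)


# Hypothetical file where Title -> Director holds
movies_ok = {("T1", "D1", "A1"), ("T1", "D1", "A2"), ("T2", "D2", "A1")}
# Hypothetical file where it does not: title T1 has two directors
movies_bad = {("T1", "D1", "A1"), ("T1", "D2", "A2")}
```

When the FD holds the two queries return the same set. On `movies_bad` the two-scan query returns extra pairs such as `("D1", "A2")`, so the optimization genuinely depends on the FD.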

8 Conclusions and future work 

"The great merit of algebra is as a powerful tool for exploring family re- 
lationships over a wide range of different theories. (...) It is only their 
algebraic properties that emphasise the family likenesses (...) Algebraic 
proofs by term rewriting are the most promising way in which computers 
can assist in the process of reliable design. " 

(Hoare and Jifeng, 1998)

There is growing interest in applying abstract algebra techniques in computer
science as a way to promote calculation in software engineering. Moreover,
algebraic structures such as idempotent semirings and Kleene algebras (which
relation algebra is an instance of) have been shown to be amenable to automation
(Höfner and Struth, 2007). Möller et al. (2012), for instance, encode a database
preference theory into idempotent semiring algebra and show how to use Prover9
to discharge proofs. Model checking in tools such as, e.g., the Alloy Analyser
also blends well with quantifier-free relational models (Oliveira and Ferreira,
2012).

Abstract algebra has the power to unify seemingly disparate theories once 
they are encoded into the same abstract terms. In the current paper we have 
shown how a relational rendering of both Hoare logic and data dependency 
theory purports one such unification, in spite of the former being an algorithmic 
theory and the latter a data theory, as both algorithms and data structures unify 
into binary relations. 



Other such unifications could be devised. For instance, Mili et al. (1985)
reason about while-loops $w = (\textbf{while}\ t\ \textbf{do}\ b)$ in terms of
so-called strongest invariant functions, where invariant functions $f$, ordered
by injectivity, are such that $f \cdot [t] = f \cdot b \cdot [t]$ holds. A
simple argument in relation algebra shows this equivalent to
$f \cdot b \cdot [t] \subseteq f$, thus entailing FD
$f \xleftarrow{b \cdot [t]} f$.



18 By the way: symmetry between $a$ and $d$ in calculation step
$d \cdot M \cdot t^\circ \cdot t \cdot M \cdot a^\circ \subseteq d \cdot M \cdot a^\circ$
above immediately tells that FD $a \xleftarrow{M} t$ would also enable the
proposed optimization.
On a more practical register, our algebraic framework makes it possible to
type-check database operations and optimize queries by calculation once they
are written as Tarskian, quantifier-free formulas. We would like to investigate
this further in connection to Wisnesky (2012)'s point-free query compiler.



Back to the opening story, surely Tarski's work on satisfaction and truth is 
relevant to computer science. But Etchemendy's answer could have been better 
tuned to the particular context of database technology suggested by the Oracle 
towers landscape: 

[...] "They would never have been built without Tarski's work on the 
calculus of binary relations. " 